Amendments to the Claims
This listing of claims will replace all prior versions of claims in the application: 
Listing of Claims: 

1. (Currently Amended) A system that facilitates extracting data in connection with
spam processing, comprising: 

a computer readable storage medium comprising: 

a component that receives a message and extracts a set of features 
associated with some part, content or content type of a message; and 

an analysis component that at least examines consecutiveness of 
characters within a subject line of the message or at least examines a content type 
of the message for spam in connection with building a filter, wherein the content
type is case-sensitive, comprises primary content-type and a secondary-content
type, or combinations thereof.

2. (Original) The system of claim 1, the analysis component determines frequency of 
consecutive repeating characters within the subject line of the message. 

3. (Original) The system of claim 2, the characters comprise letters, numbers, or 
punctuation. 

4. (Original) The system of claim 1, the analysis component determines frequency of 
white space characters within the subject line of the message. 

5. (Original) The system of claim 1, the analysis component determines distance 
between at least one alpha-numeric character and a blob. 

6. (Original) The system of claim 1, the analysis component determines a maximum 
number of consecutive, repeating characters and stores this information. 

7. (Original) The system of claim 1, the analysis component establishes ranges of
consecutive, repeating characters, the ranges corresponding to varying degrees of spaminess, 
whereby messages can be sorted by their respective individual count of consecutive repeating 
characters. 
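
For illustration only, and forming no part of the claims, the features recited in claims 6 and 7 could be sketched as follows. The function names and the range boundaries in `RANGE_BOUNDS` are hypothetical assumptions; the claims do not specify any particular values or implementation.

```python
from itertools import groupby

# Hypothetical range boundaries (an assumption; the claims give no values).
RANGE_BOUNDS = (2, 4, 8)


def max_repeat_run(text: str) -> int:
    """Return the length of the longest run of one character repeated consecutively."""
    return max((sum(1 for _ in group) for _, group in groupby(text)), default=0)


def run_range(count: int) -> int:
    """Return the index of the range a run length falls into (0 = shortest runs)."""
    return sum(count > bound for bound in RANGE_BOUNDS)


def sort_by_repeat_count(subjects: list[str]) -> list[str]:
    """Sort subject lines by their individual maximum run length, longest first."""
    return sorted(subjects, key=max_repeat_run, reverse=True)
```

Under these assumed boundaries, a subject line such as "FREE!!!!! offer" has a maximum run of five characters and falls into the third range.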

8. (Cancelled) 

9. (Currently Amended) The system of claim 1[[8]], the analysis component 
compares the content type of a current message to stored content types of a plurality of other 
messages to facilitate determining whether the message is spam. 

10. (Cancelled) 

11. (Cancelled) 

12. (Original) The system of claim 1, the analysis component further determines time 
stamps associated with the message. 

13. (Original) The system of claim 12, the analysis component determining a delta 
between time stamps. 

14. (Original) The system of claim 13, the delta is between a first and a last time
stamp.

15. (Original) The system of claim 1, the analysis component determines at least one 
of: a percentage of white space to non-white space in the subject line of the message and a
percentage of non-white space and non-numeric characters that are not letters in the subject line
of the message.
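
For illustration only, and forming no part of the claims, the two percentages recited in claim 15 could be computed roughly as below. The function name `subject_percentages` is hypothetical, and the choice of denominators (non-white-space characters for the first value, all characters for the second) is one assumed reading of the claim language.

```python
def subject_percentages(subject: str) -> tuple[float, float]:
    """Return (white space relative to non-white space, symbol share), both in percent."""
    white = sum(ch.isspace() for ch in subject)
    non_white = len(subject) - white
    # Characters that are neither white space, nor digits, nor letters.
    symbols = sum(
        not (ch.isspace() or ch.isdigit() or ch.isalpha()) for ch in subject
    )
    white_pct = 100.0 * white / non_white if non_white else 0.0
    symbol_pct = 100.0 * symbols / len(subject) if subject else 0.0
    return white_pct, symbol_pct
```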

16. (Original) The system of claim 1, the filter being a spam filter. 

17. (Original) The system of claim 1, the filter being a parental control filter.

18. (Original) The system of claim 1, further comprising a machine learning system 
component that employs at least a subset of extracted features to learn at least one of spam and 
non-spam. 

19. (Currently Amended) A system embodied on a computer readable storage
medium that facilitates extracting data in connection with spam processing, comprising: 

a component that receives an item and extracts a set of features indicative of spam 
associated with a message, at least one of the features is a normalized time delta; and

an analysis component that determines whether an embedded message or 
attachment is associated with the message. 

20. (Original) The system of claim 19, the analysis component identifies a type of 
embedded message or attachment to facilitate predicting whether the message is spam. 

21. (Original) The system of claim 19, further comprising a component that employs 
at least a subset of the extracted features to populate at least one feature list. 

22. (Original) The system of claim 21, the at least one feature list is any one of a list
of good users, a list of spammers, a list of positive features indicating legitimate sender, and a list 
of features indicating spam. 

23. (Original) The system of claim 19, further comprising a component that examines 
at least a portion of a message body. 

24. (Original) The system of claim 23, the component examines at least a beginning 
portion of the message body. 

25. (Original) The system of claim 23, the component determines at least one of: a
percentage of white space to non-white space in the message body and a percentage of non-white
space and non-numeric characters that are not letters in the message body.

26. (Original) The system of claim 23, the component determines a percentage or a 
number of consecutive lines of a message body to examine. 

27. (Original) The system of claim 23, the component examines the message body for 
the presence of at least one blob or consecutive, repeating characters. 

28. (Original) The system of claim 27, the characters comprising letters, punctuation, 
and numbers. 

29. (Currently Amended) A computer-readable storage medium that performs a 
method that facilitates spam detection and prevention, the method comprising:

receiving a plurality of messages, the plurality comprising at least a first 
and a second message; 

extracting at least a subset of information from the plurality of messages, the 
information being from at least one of a subject line, a content-type header, a received header, 
and a message body; and 

analyzing the subset of information to generate one or more features to facilitate 
training a filter; 

determining time stamps associated with the message; and 

determining a delta between a first time stamp and a last time stamp, the first time 
stamp being located in a Received header and the last time stamp being located in a Date 
header at the message's destination.
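
For illustration only, and forming no part of the claims, the time stamp delta recited above could be sketched with the Python standard library as follows. The function names are hypothetical, and two assumptions are made: the Received time stamp is taken to follow the final semicolon of the header value, and both time stamps are normalized to coordinated universal time before subtraction.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def to_utc(stamp: str) -> datetime:
    """Parse an RFC 5322 style date string and normalize it to UTC."""
    parsed = parsedate_to_datetime(stamp.strip())
    if parsed.tzinfo is None:
        # Assumption: a time stamp without zone information is treated as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def timestamp_delta_seconds(received_value: str, date_value: str) -> float:
    """Return the delta in seconds between a Received time stamp and a Date time stamp."""
    first = to_utc(received_value.rsplit(";", 1)[-1])
    last = to_utc(date_value)
    return (last - first).total_seconds()
```

Normalizing first means that time stamps written in different zones compare correctly, so the delta reflects elapsed time rather than zone offsets.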

30. (Original) The method of claim 29, analyzing the subset of information comprises 
determining a number of consecutive repeating characters within the subject line or the message 
body of the message. 

31. (Original) The method of claim 30, the characters comprise letters, numbers, or
punctuation. 

32. (Original) The method of claim 29, analyzing the subset of information comprises 
determining a frequency of white space characters within the subject line of the message. 

33. (Original) The method of claim 29, analyzing the subset of information comprises 
determining a distance between at least one alpha-numeric character and a blob. 

34. (Original) The method of claim 29, analyzing the subset of information 
comprises: 

determining a maximum number of consecutive, repeating 
characters and storing this information; and 

establishing ranges of consecutive, repeating characters, the ranges 
corresponding to varying degrees of spaminess, whereby messages can be sorted by their 
respective individual count of consecutive repeating characters. 

35. (Original) The method of claim 29, analyzing the subset of information 
comprises: 

determining content type associated with the message; and 
comparing the content type of a current message to stored content types of 
a plurality of other messages to facilitate determining whether the message is spam. 

36. (Cancelled) 

37. (Original) The method of claim 29, analyzing the subset of information comprises 
determining a percentage or a number of consecutive lines of a message body to examine at least 
one of: a percentage of white space to non-white space in the subject line of the message and a 
percentage of non-white space and non-numeric characters that are not letters in the subject line
of the message. 

38. (Original) The method of claim 29, analyzing the subset of information comprises
determining whether an embedded message or an attachment exists in the message and 
identifying a type of embedded message or attachment to facilitate predicting whether the 
message is spam. 

39. (Original) The method of claim 29, analyzing the subset of information comprises 
examining at least a beginning portion of the message body. 

40. (Currently Amended) A computer-readable medium having stored thereon the 
following computer executable components: 

a component that receives a message and extracts a set of features associated with 
some part, content or content type of a message, wherein at least one content type is
case-sensitive, comprises primary content-type and a secondary-content type, or combinations thereof;

an analysis component that examines at least consecutiveness of characters within 
a subject line of the message in connection with building a filter; 

a component that determines a delta between a first time stamp and a last time 
stamp associated with the message, the first time stamp and the last time stamp are normalized to
a coordinated universal time; 

a component that determines whether an embedded message or attachment is 
associated with the message; and 

a component that determines a percentage or a number of consecutive lines of a 
message body to examine and that examines the message body for the presence of at least one 
blob or consecutive, repeating characters. 

41. (Currently Amended) A system embodied on one or more computers that
facilitates [[printing from a web page]] extracting data in connection with spam processing,
comprising: 

means for receiving a plurality of messages, the plurality comprising at least a 
first and a second message; 

means for extracting at least a subset of information from the plurality of
messages, the information being from at least one of a subject line, a content-type header, a 
received header, and a message body; and 

means for analyzing the subset of information to generate one or more features to 
facilitate training a filter, the means for analyzing the subset of information comprising: 

means for determining a number of consecutive repeating characters within the 
subject line or the message body of the message; 

means for determining a delta between a first time stamp and a last time 
stamp associated with the message, the first time stamp and the last time stamp
are normalized to a coordinated universal time;

means for determining whether an embedded message or an attachment 
exists in the message and identifying a type of embedded message or attachment to 
facilitate predicting whether the message is spam; and 

means for determining a percentage or a number of consecutive lines of a 
message body to examine at least one of: a percentage of white space to non-white space 
in the subject line of the message and a percentage of non-white space and non-numeric 
characters that are not letters in the subject line of the message. 






